
    Distributed Frequent Item Sets Mining over P2P Networks

    Data-intensive peer-to-peer (P2P) networks are becoming increasingly popular in applications such as social networking and file sharing. Data mining in such P2P environments represents a new generation of advanced P2P applications. Unfortunately, most existing data mining algorithms do not fit well in such environments, since they require that the data be accessible in its entirety. The task is made harder still by the requirements of online transactional data streams. In this paper, we develop a local algorithm for tracing frequent item sets over a P2P network. The performance of the proposed algorithm is tested and analyzed comparatively through a series of experiments.

    Efficient mining of Fuzzy Association Rules from the Pre-Processed Dataset

    Association rule mining is an active data mining research area. Recent years have witnessed many efforts to discover fuzzy associations. The key strength of fuzzy association rule mining is its completeness. This strength, however, comes with a major drawback when handling large datasets: it often produces a huge number of candidate itemsets, which makes it ineffective for a data mining system to analyze them. In the end, it produces a huge number of fuzzy associations. This is particularly true for datasets whose attributes are highly correlated. The huge number of fuzzy associations makes it very difficult for a human user to analyze them. Existing research has shown that most of the discovered rules are actually redundant or insignificant. In this paper, we propose a novel technique to overcome these problems: we preprocess the data tuples by focusing on attributes with similar behaviour and on ontology. Finally, the efficiency and advantages of this algorithm are demonstrated by experimental results.

    Max-FISM: Mining (recently) maximal frequent itemsets over data streams using the sliding window model

    Frequent itemset mining from data streams is an important data mining problem with broad applications such as retail market data analysis, network monitoring, web usage mining, and stock market prediction. However, it is also a difficult problem due to the unbounded, high-speed and continuous characteristics of streaming data. Therefore, extracting frequent itemsets from more recent data can enhance the analysis of stream data. In this paper, we propose an efficient algorithm, called Max-FISM (Maximal-Frequent Itemsets Mining), for mining recent maximal frequent itemsets from a high-speed stream of transactions within a sliding window. According to our algorithm, whenever a new transaction is inserted into the current window, only its maximum itemset needs to be inserted into a prefix tree-based summary data structure called Max-Set, which maintains the number of independent appearances of each transaction in the current window. Finally, the set of recent maximal frequent itemsets is obtained from the current Max-Set. Experimental studies show that the proposed Max-FISM algorithm is highly efficient in terms of memory and time complexity for mining recent maximal frequent itemsets over high-speed data streams.
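
    To make the problem statement concrete, the following is a minimal sketch of mining maximal frequent itemsets over a sliding window. It is not Max-FISM itself: the abstract does not specify the Max-Set structure in enough detail to reproduce, so this version recounts all subsets by brute force on every arrival. The names `maximal_frequent_itemsets`, `mine_stream`, `window_size`, and `min_support` are assumptions introduced for illustration only.

```python
from collections import Counter, deque
from itertools import combinations


def maximal_frequent_itemsets(window, min_support):
    """Return the maximal frequent itemsets among the transactions in `window`.

    Brute-force illustration only; this is NOT the Max-Set prefix tree
    described in the abstract.
    """
    counts = Counter()
    for transaction in window:
        items = sorted(transaction)
        # Count every non-empty subset (exponential; short transactions only).
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[frozenset(subset)] += 1
    frequent = [s for s, c in counts.items() if c >= min_support]
    # An itemset is maximal if no frequent proper superset exists.
    return {s for s in frequent if not any(s < other for other in frequent)}


def mine_stream(stream, window_size, min_support):
    """Yield the maximal frequent itemsets after each arriving transaction."""
    window = deque(maxlen=window_size)  # the oldest transaction drops out automatically
    for transaction in stream:
        window.append(frozenset(transaction))
        yield maximal_frequent_itemsets(window, min_support)


if __name__ == "__main__":
    stream = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"d"}]
    for step, result in enumerate(mine_stream(stream, window_size=3, min_support=2), 1):
        print(step, sorted(sorted(s) for s in result))
```

    Recounting the whole window on each arrival is exactly the cost that a summary structure such as the one the abstract describes is meant to avoid; the sketch only shows what output a sliding-window maximal-itemset miner is expected to produce.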

    To Better Handle Concept Change and Noise: A Cellular Automata Approach to Data Stream Classification

    A key challenge in data stream classification is to detect changes of the concept underlying the data, and to adapt classifiers to each concept change accurately and efficiently. Most existing methods for handling concept changes take a windowing approach, where only recent instances are used to update classifiers while old instances are discarded indiscriminately. However, this approach can often be undesirably aggressive, because many old instances may not be affected by the concept change and hence can still contribute to training the classifier, for instance by reducing the classification variance error caused by insufficient training data. Accordingly, this paper proposes a cellular automata (CA) approach that feeds classifiers with the most relevant instead of the most recent instances. The strength of CA is that it breaks a complicated process down into smaller adaptation tasks, for each of which a single automaton is responsible. Using neighborhood rules embedded in each automaton and the emerging time of instances, this approach assigns a relevance weight to each instance. Instances with high enough weights are selected to update classifiers. Theoretical analyses and experimental results suggest that a good choice of local rules for CA can considerably speed up the updating of classifiers in response to concept changes, increase classifiers' robustness to noise, and thus offer faster and better classifications for data streams.